| Cascading Robustness Verification: Toward Efficient Model-Agnostic Certification. Certifying neural network robustness against adversarial examples is challenging, as formal guarantees often require solving non-convex problems. Hence, incomplete verifiers are widely used because they scale efficiently and substantially reduce the cost of robustness verification compared to complete methods. However, relying on a single verifier can underestimate robustness because of loose approximations or misalignment with training methods. In this work, we propose Cascading Robustness Verification (CRV), which goes beyond an engineering improvement by exposing fundamental limitations of existing robustness metrics and introducing a framework that enhances both reliability and efficiency. CRV is a model-agnostic verifier, meaning that its robustness guarantees are independent of the model's training process. The key insight behind the CRV framework is that, when using multiple verification methods, an input is certifiably robust if at least one method certifies it as robust. Rather than relying solely on a single verifier with a fixed constraint set, CRV progressively applies multiple verifiers to balance the tightness of the bound and computational cost. Starting with the least expensive method, CRV halts as soon as an input is certified as robust; otherwise, it proceeds to more expensive methods. For computationally expensive methods, we introduce a Stepwise Relaxation Algorithm (SR) that incrementally adds constraints and checks for certification at each step, thereby avoiding unnecessary computation. Our theoretical analysis demonstrates that CRV achieves equal or higher verified accuracy compared to powerful but computationally expensive incomplete verifiers in the cascade, while significantly reducing verification overhead. Empirical results confirm that CRV certifies at least as many inputs as benchmark approaches, while improving runtime efficiency by up to ~90%.
Toronto Metropolitan University, McMaster University | Publication | 2026-03-23 | Maleki, M., Rushendra Sidibomma, Arman Adibi, Samavi, R. |
| Efficient Subsampling for GNN Downstream Tasks. While Graph Neural Networks (GNNs) have shown significant promise for data integration using graph structures, methods to support subsampling graph data are lagging. To address this gap, in this paper, we propose a novel importance-based data subsampling framework. This framework strategically identifies inputs from a primary graph dataset based on their impact on the model's learning of downstream tasks, such as graph or node classification. Our measure of impact is the predictive uncertainty of each data point. To ensure the subsample is representative of the original sample, we cluster the data points based on their learned graph representations. Finally, subsampling is performed from these identified clusters. The process favours selecting data points with greater prediction uncertainty, while preserving the diversity of the overall sample. We evaluate our approach using a multi-source, real-world dataset on child and youth mental health, comprising emergency department (ED) admissions and mental health questionnaire data. Our experimental results demonstrate that training a GNN with samples identified by the proposed framework yields a statistically significant improvement (on average, 10.13% improvement across metrics from the baseline approach) in predictive performance compared to training on a randomly selected subset of patients. The code is available at https://github.com/tailabTMU/GSS. Toronto Metropolitan University, McMaster University | Publication | 2025-11-24 | Daneshvar, H., Samavi, R. |
| ICN Congress, Helsinki University of Alberta | Conference | 2025-06-09 | Meherali, S. |
| Northwest SPOR Collaborative Forum. University of Alberta | Conference | 2025-05-12 | Meherali, S. |
| GNN’s Uncertainty Quantification using Self-Distillation. Graph Neural Networks (GNNs) have shown remarkable performance in the healthcare domain. However, quantifying the predictive uncertainty of GNNs, an important aspect of trustworthiness in clinical settings, remains challenging. While Bayesian and ensemble methods can be used to quantify uncertainty, they are computationally expensive. Additionally, the disagreement metric used by ensemble methods to compute uncertainty cannot capture the diversity of models in an ensemble network. In this paper, we propose a novel method, based on knowledge distillation, to quantify GNNs’ uncertainty more efficiently and with higher precision. We apply self-distillation, where the same network serves as both the teacher and student models, thereby avoiding the need to train several networks independently. To ensure the impact of self-distillation, we develop an uncertainty metric that captures the diverse nature of the network by assigning different weights to each GNN classifier. We experimentally evaluate the precision, performance, and ability of our approach in distinguishing out-of-distribution data on two graph datasets: MIMIC-IV and Enzymes. The evaluation results demonstrate that the proposed method can effectively capture the predictive uncertainty of the model while having performance similar to that of the MC Dropout and ensemble methods. Toronto Metropolitan University, Vector Institute, McMaster University | Publication | 2025-06-18 | Daneshvar, H., Samavi, R. |
| SACP: Spatially-Adaptive Conformal Prediction in Uncertainty Quantification of Medical Image Segmentation. While Conformal Prediction provides statistical coverage guarantees, existing non-conformity measures fail to account for spatially varying importance of predictive uncertainty in medical image segmentation. In this paper, we incorporate spatial context near critical interfaces such as a vessel or critical organ in medical image segmentation. Our framework consists of three key components: (1) a base non-conformity score derived from segmentation model probabilities, (2) class-conditional calibration followed by a validation mechanism equipped with a distance-weighted scoring function that exponentially decays with distance from key interfaces, and (3) a prediction set construction method that preserves coverage guarantees while providing targeted uncertainty quantification in critical regions. While our approach is generalizable to different scenarios, for validation purposes, we employ tumor segmentation in pancreatic adenocarcinoma imaging from multiple medical centers. Results demonstrate that our method achieves the desired coverage levels while generating prediction sets that adaptively expand near critical interfaces. Toronto Metropolitan University, Vector Institute | Publication | 2025-03-27 | Jacqueline Isabel Bereska, Karimi, H., Samavi, R. |
| Database Competition: Migrant Integration in the Mid-21st Century: Bridging Divides McMaster University, Toronto Metropolitan University | Grant | 2025-11-20 | Samavi, R. |
| Practical Trustworthiness of Deep Neural Networks Vector Institute, McMaster University, Toronto Metropolitan University | Grant | 2025-04-01 | Samavi, R. |
| Trustworthy LLM-based Conversation Agents to Enhance Migrant Youth Mental Health Vector Institute | Grant | 2025-01-02 | Samavi, R. |
| Global AI Summit McMaster University, Toronto Metropolitan University | Conference | 2025-10-29 | Samavi, R. |
| York University Connected Minds Conference McMaster University, Toronto Metropolitan University | Conference | 2025-10-03 | Samavi, R. |
| 157. Predicting Child and Youth ED Visits With Large Language Models Toronto Metropolitan University, Vector Institute, McMaster University, University of British Columbia | Publication | 2025-04-09 | Czobit, C., Samavi, R., Daneshvar, H., Sassi, R., Laura Duncan, Ahmad Mauluddin, Judy Zhao, Paulo Pires, Thomas E Doyle |
| 1775 Using a Convolutional Neural Network to Extract the Margin Status from Free Text Pathology Reports: A Tool for Quality Assessment Vector Institute, Toronto Metropolitan University | Publication | 2025-03-01 | Michael Bonert, Kimberley Yuen, Achilleas Thoma, Samavi, R. |
| 61. Scoping Review of Deep Learning Applications in Child and Youth Mental Health Research Toronto Metropolitan University, Vector Institute, McMaster University, University of British Columbia | Publication | 2025-04-09 | Manasvi Vanama, Simran Saggu, Krysten DeSouza, Daneshvar, H., Czobit, C., Ahmad Mauluddin, Judy Zhao, Samavi, R., Thomas E Doyle, Paulo Pires, Sassi, R., Laura Duncan |
| Estimating Quality in Therapeutic Conversations: A Multi-Dimensional Natural Language Processing Framework. Engagement between client and therapist is a critical determinant of therapeutic success. We propose a multi-dimensional natural language processing (NLP) framework that objectively classifies engagement quality in counseling sessions based on textual transcripts. Using 253 motivational interviewing transcripts (150 high-quality, 103 low-quality), we extracted 42 features across four domains: conversational dynamics, semantic similarity as topic alignment, sentiment classification, and question detection. Classifiers, including Random Forest (RF), CatBoost, and Support Vector Machines (SVM), were hyperparameter-tuned and trained using stratified 5-fold cross-validation and evaluated on a holdout test set. On balanced (non-augmented) data, RF achieved the highest classification accuracy (76.7%), and SVM achieved the highest AUC (85.4%). After SMOTE-Tomek augmentation, performance improved significantly: RF achieved up to 88.9% accuracy, 90.0% F1-score, and 94.6% AUC, while SVM reached 81.1% accuracy, 83.1% F1-score, and 93.6% AUC. The augmented data results reflect the potential of the framework in future larger-scale applications. Feature contribution revealed conversational dynamics and semantic similarity between clients and therapists were among the top contributors, led by words uttered by the client (mean and standard deviation). The framework was robust across the original and augmented datasets and demonstrated consistent improvements in F1 scores and recall. While currently text-based, the framework supports future multimodal extensions (e.g., vocal tone, facial affect) for more holistic assessments. This work introduces a scalable, data-driven method for evaluating engagement quality of the therapy session, offering clinicians real-time feedback to enhance the quality of both virtual and in-person therapeutic interactions.
Vector Institute, McMaster University, Toronto Metropolitan University, University of Toronto | Publication | 2025-05-09 | Alice Rueda, Argyrios Perivolaris, Niloy Roy, Dylan Weston, Sarmed Shaya, Zachary Cote, Martin Ivanov, Bazen G Teferra, Yuqi Wu, Sirisha Rambhatla, Divya Sharma, Andrew Greenshaw, Rakesh Jetly, Yanbo Zhang, Bo Cao, Samavi, R., Sridhar Krishnan, Bhat, V. |
| Generative Adversarial Networks for Neuroimage Translation. Image-to-image translation has gained popularity in the medical field to transform images from one domain to another. Medical image synthesis via domain transformation is advantageous in its ability to augment an image dataset where images for a given class are limited. From the learning perspective, this process contributes to the data-oriented robustness of the model by inherently broadening the model’s exposure to more diverse visual data and enabling it to learn more generalized features. In the case of generating additional neuroimages, it is advantageous to obtain unidentifiable medical data and augment smaller annotated datasets. This study proposes the development of a cycle-consistent generative adversarial network (CycleGAN) model for translating neuroimages from one field strength to another (e.g., 3 Tesla [T] to 1.5 T). This model was compared with a model based on a deep convolutional GAN model architecture. CycleGAN was able to generate the synthetic and reconstructed images with reasonable accuracy. The mapping function from the source (3 T) to the target domain (1.5 T) performed optimally with an average peak signal-to-noise ratio value of 25.69 ± 2.49 dB and a mean absolute error value of 2106.27 ± 1218.37. The code for this study has been made publicly available in the following GitHub repository. Toronto Metropolitan University, Vector Institute, McMaster University | Publication | 2025-05-30 | Czobit, C., Samavi, R. |
| Human vs. LLM-Based Thematic Analysis for Digital Mental Health Research: Proof-of-Concept Comparative Study. Thematic analysis provides valuable insights into participants' experiences through coding and theme development, but its resource-intensive nature limits its use in large healthcare studies. Large language models (LLMs) can analyze text at scale and identify key content automatically, potentially addressing these challenges. However, their application in mental health interviews needs comparison with traditional human analysis. This study evaluates out-of-the-box and knowledge-based LLM-based thematic analysis against traditional methods using transcripts from a stress-reduction trial with healthcare workers. OpenAI's GPT-4o model was used along with the Role, Instructions, Steps, End-Goal, Narrowing (RISEN) prompt engineering framework and compared to human analysis in Dedoose. Each approach developed codes, noted saturation points, applied codes to excerpts for a subset of participants (n = 20), and synthesized data into themes. Outputs and performance metrics were compared directly. LLMs using the RISEN framework developed deductive parent codes similar to human codes, but humans excelled in inductive child code development and theme synthesis. Knowledge-based LLMs reached coding saturation with fewer transcripts (10-15) than the out-of-the-box model (15-20) and humans (90-99). The out-of-the-box LLM identified a comparable number of excerpts to human researchers, showing strong inter-rater reliability (K = 0.84), though the knowledge-based LLM produced fewer excerpts. Human excerpts were longer and involved multiple codes per excerpt, while LLMs typically applied one code. Overall, LLM-based thematic analysis proved more cost-effective but lacked the depth of human analysis. LLMs can transform qualitative analysis in mental healthcare and clinical research when combined with human oversight to balance participant perspectives and research resources.
Vector Institute, McMaster University, Toronto Metropolitan University, University of Toronto | Publication | 2025-05-02 | Karisa Parkington, Bazen G Teferra, Marianne Rouleau-Tang, Argyrios Perivolaris, Alice Rueda, Adam Dubrowski, Bill Kapralos, Samavi, R., Andrew Greenshaw, Yanbo Zhang, Bo Cao, Yuqi Wu, Sirisha Rambhatla, Sridhar Krishnan, Bhat, V. |
| Leveraging large language models for automated depression screening. Mental health diagnoses pose unique challenges that often lead to nuanced difficulties in managing an individual's well-being and daily functioning. Self-report questionnaires are a common practice in clinical settings to help mitigate the challenges involved in mental health disorder screening. However, these questionnaires rely on an individual's subjective response, which can be influenced by various factors. Despite the advancements of Large Language Models (LLMs), quantifying self-reported experiences with natural language processing has resulted in imperfect accuracy. This project aims to demonstrate the effectiveness of zero-shot LLMs for depression screening and for assessing individual item scales. The DAIC-WOZ is a publicly available mental health dataset that contains textual data from clinical interviews and self-report questionnaires with relevant mental health disorder labels. The RISEN prompt engineering framework was utilized to evaluate LLMs' effectiveness in predicting depression symptoms based on individual PHQ-8 items. Various LLMs, including GPT models, Llama3_8B, Cohere, and Gemini, were assessed based on performance. The GPT models, especially GPT-4o, were consistently better than other LLMs (Llama3_8B, Cohere, Gemini) across all eight items of the PHQ-8 scale in accuracy (M = 75.9%) and F1 score (0.74). GPT models were able to predict PHQ-8 items related to emotional and cognitive states. Llama3_8B demonstrated superior detection of anhedonia-related symptoms, and the Cohere LLM's strength was identifying and predicting psychomotor activity symptoms. This study provides a novel outlook on the potential of LLMs for predicting self-reported questionnaire scores from textual interview data. The promising preliminary performance of the various models indicates there is potential that these models could effectively assist in the screening of depression.
Further research is needed to establish a framework for which LLM can be used for specific mental health symptoms and other disorders. As well, analysis of additional datasets while fine-tuning models should be explored. Vector Institute, McMaster University, Toronto Metropolitan University, University of Toronto | Publication | 2025-07-01 | Bazen Gashaw Teferra, Argyrios Perivolaris, Wei-Ni Hsiang, Christian Kevin Sidharta, Alice Rueda, Karisa Parkington, Yuqi Wu, Anuja Soni, Samavi, R., Rakesh Jetly, Yanbo Zhang, Bo Cao, Sirisha Rambhatla, Sridhar Krishnan, Bhat, V. |
| Opinion: Mental health research: to augment or not to augment. The integration of artificial intelligence (AI) and machine learning (ML) into healthcare is growing, with tools often limited by data scarcity and biases, issues particularly pronounced in mental health research. Data augmentation, a method of artificially expanding datasets, holds promise for addressing these challenges by creating synthetic data, improving diversity, and reducing costs. This has shown success in medical imaging, yet mental health datasets face unique barriers, including subjective measurements, privacy concerns, and underrepresentation of marginalized groups. Augmented data can help balance these datasets, enhance diagnostic accuracy, and improve generalizability, enabling more equitable AI models. However, risks such as replicating existing biases, losing cultural context, and producing clinically unreliable augmented data require careful consideration. In mental health, small variations in data can influence outcomes significantly, and poorly designed augmentation could oversimplify complex experiences. To harmonize potential with caution, augmented data should complement real-world data and be rigorously evaluated by clinicians for alignment with expertise. Ethical implications, including consent and privacy, demand careful frameworks to ensure augmented datasets are responsibly used. While data augmentation offers exciting opportunities to advance mental health research, its implementation must prioritize transparency, clinical fidelity, and equity. In recent years, there has been significant growth in the use of artificial intelligence (AI) and machine learning (ML) in healthcare. AI-based tools are increasingly used to predict diagnoses, personalize treatment plans, and assess risk factors, aiming to enable more scalable mental health care solutions.
However, mental health research is often limited by the availability of high-quality, large-sample datasets and confounded by the multifaceted complexities of human behaviors and emotions. 1 To address this gap, researchers have begun to utilize data augmentation techniques to expand available datasets. Generating new data artificially enables models to use larger and more complete training datasets. Consider the medical imaging field, where AI has become a prominent fixture in practice. Data augmentation has demonstrated benefits across all organs and modalities to help promote medical imaging training without investing time and resources into collecting new samples. 2 However, mental health research presents many unique barriers to integrating data augmentation. Biases inherent in the original set of mental health data remain and can result in overfitting, where a model is unable to make accurate predictions from any data other than the training data. This article explores the unique challenges researchers must overcome due to the lack of representative mental health data and how these challenges interact with AI and ML advancements. We explore data augmentation as a tool to bridge this gap, offering an integrative perspective on the ethical and practical challenges. As researchers consider data augmentation in mental health research, it is critical to evaluate the promise through rigorous methodologies and research and decide whether 'to augment or not to augment'. As AI and ML algorithms have advanced exponentially in recent years, one of the most prominent limiting factors remains the availability of representative training data that determines model performance. 3 In contrast to synthetic data that creates data from scratch, data augmentation is an ML technique used to create new data based on existing data points, thereby artificially expanding a dataset.
There are several data augmentation methods, with some incorporating simple transformations of image or text data (rotating images by random degrees, flipping images horizontally, and back-translation of text through another language). Expanding on this, generative adversarial network (GAN) based augmentation is a more sophisticated strategy that uses neural networks to create novel samples from a pre-existing dataset. For example, GANs can augment data for chest X-rays that not only improve classification accuracy but perform better than other simple transformation methods. 4 Large language models (LLMs), such as GPT-4o, have also been used for clinical transcript data augmentation. 5 With various strategies available, data augmentation can be a potential tool for all fields of medical research moving forward. Augmentation can address class imbalance while preserving anonymity, facilitating cross-lingual and robust mental health research with available data. While data augmentation may enhance model generalizability and facilitate new research, mental health research introduces concerns about augmentation because of unique challenges in balancing realism and mitigating biases. To Augment: Overcoming Data Scarceness. Data scarcity remains a significant challenge in mental health research. Unlike other areas of medicine that can evaluate objective data from available biomarkers and imaging, mental health research relies on qualitative interviews, self-reported surveys, questionnaires, and clinical notes. The subjective nature of mental health concepts, such as emotional well-being, 6 also makes developing universally accepted definitions challenging. Despite self-reported measurements being cost-efficient, flexible, and valuable for uncovering personal perceptions, 7 many datasets do not provide the comprehensive, diverse, and sufficient data necessary for generalizable and reliable research.
Furthermore, data collection is hindered by high costs, privacy concerns, stigma, and recruitment difficulties. Augmented data presents a promising opportunity to address these issues. By artificially generating new data, such as augmented text or audio, researchers can increase usable data, mitigate the concerns of dependency on subjective reports of experiences, and enhance the scalability of mental health studies. 8 Data augmentation is a cost-effective alternative to collecting new clinical data, reducing the reliance on expensive longitudinal studies. By using augmented data, researchers may limit the reliance on personally identifiable information, enhancing privacy protection. As well, researchers can instead focus efforts on generating new insights and testing hypotheses using readily available datasets. Mental health datasets are often highly imbalanced, with certain conditions underrepresented (e.g., borderline personality disorder) compared to others (e.g., depression), and gender disparities in diagnosis, treatment, and research. Rare mental health disorders can present with uncommon symptoms, which can complicate diagnoses. 9 Moreover, certain populations, such as children, seniors, racial minorities, LGBTQ2+, and marginalized groups, are also underrepresented in datasets. This imbalance can lead to biased conclusions and unreliable predictive models, which can perpetuate disparities and further marginalize underserved populations. Addressing these issues, data augmentation can create more balanced datasets by artificially increasing the representation of minority classes, allowing ML models to better detect and treat underrepresented conditions and populations. AI-generated data can impute missing information and ensure datasets are more diverse, leading to more inclusive and equitable models.
For instance, psychiatric symptoms often manifest differently across age groups and genders, with adolescents and adults experiencing distinct presentations of similar conditions. 10 Augmented data allows for a better representation of subgroups, which can enhance diagnostic accuracy and treatment outcomes. Augmentation is also crucial in scaling AI models. Introducing synthetic variations, such as noise injection, makes models more robust and less prone to overfitting. This increased variability enables models to learn general patterns rather than memorizing specific instances, thus improving their generalizability across different patient groups. This is particularly beneficial in mental health research, where there is significant variability in behavior and emotions. For example, consider a research team studying depressive disorders in a population skewed towards high symptom severity levels. An ML model trained on this real-world data may not predict accurate outcomes when applied to patient groups with lower symptom severity levels. 11 However, researchers can achieve more accurate and generalizable predictions by generating augmented data that mimics these underrepresented cases. 12 Incorporating data augmentation could improve research and clinical practice outcomes, allowing decision-support tools to be developed, and offering more equitable recommendations. Not To Augment: Bias and Clinician Fidelity. Mental health data is nuanced and profoundly contextual, with small variations in symptoms or patient perceptions potentially leading to different clinical outcomes. This type of data has multifactorial and complicated biological, psychological, and social components. One of the most significant risks of data augmentation is replicating and potentially amplifying biases present in the original datasets. If the original dataset underrepresents certain cultural, gender, or ethnic groups, these biases may be further embedded into the model.
Poorly designed augmented data, if not inspected by mental health professionals, may fail to respect the nuanced interplay of different symptomatology and can mistakenly intensify biases present in the original dataset, which may also introduce new biases. 13,14 In mental health research, historical biases regarding race, gender, and socioeconomic status are well-documented and must be mitigated. 15 Creating augmented data may risk the loss of meaning, especially when nuanced cultural and individual differences are simplified. This could lead to generalized stereotypes or poor representation of complex mental health experiences. 16 Augmented data may fail to consider the complexity of identities intersecting such factors, which may result in inaccurate predictions, leading to inconsistencies in treatment recommendations. This is especially problematic in mental health, where symptoms and coping mechanisms can vary greatly across cultures due to differences in language, values, and stigma around mental illness. Augmented data generated without consideration of cultural contexts might promote the development of AI models that misinterpret the mental health challenges of underrepresented populations. Traditional augmentation techniques may treat diverse groups as homogeneous, reducing cultural and ethnic variability to a few representative data points, thus risking generalization and misrepresentation. Moreover, augmented data may lack clinical expertise and the ability to reproduce real-world patient behavior and presentation. 17 Augmented data may oversimplify the variability that clinicians rely on for diagnosis and treatment. Adding synthetic noise or random data augmentation may alter key data features, causing a loss of context crucial to understanding mental health conditions. For example, AI-generated text transcripts of patient interviews might lack the subtle linguistic cues and emotional context necessary for a clinician's judgment. 
18 This disconnect could result in models that appear highly accurate in theory but fail to translate into reliable real-world clinical support. Evaluating the quality of augmented data is particularly challenging in mental health due to the subjective nature of psychological assessments and a lack of consistent validation benchmarks. Given the different perspectives in this argument, how should mental health research proceed with augmented data? The key is cautious optimism. While augmented data should not be dismissed outright, it must be integrated with real-world data in a way that preserves transparency and mitigates bias. One approach is to utilize augmented data to supplement rather than replace real-world data. Combining both traditional and augmented datasets can enhance the dataset diversity without over-relying on synthetic information. Models trained on augmented data should also be evaluated by mental health professionals with standard accuracy metrics and qualitative appraisals. This will help ensure that augmented data-driven predictions align with clinical expertise and judgment. To integrate augmented data into mental health research effectively, researchers should prioritize pilot and feasibility studies to assess its practicality and ensure alignment with clinical expertise. Collaborative efforts based on these findings can also address challenges related to bias, equity, and implementation. Societal and cultural norms heavily influence how mental health symptoms are expressed, understood, and treated. For example, some cultures may emphasize physical symptoms like headaches, while others focus on emotional or behavioural aspects. 19 Data augmentation preserves the distributional properties of the original dataset, including those with small sample sizes, imbalances, or underrepresented features.
By enhancing the diversity and representation within the dataset, models trained on well-augmented data are more likely to generalize effectively and exhibit reduced bias. Importantly, if cultural nuances are present in the original data but captured unevenly, data augmentation can help balance representation. This allows the model to better generalize across cultural subtleties, improving its fairness and applicability. Incorporating ethnographic insights or consulting cultural experts during data creation can further improve augmented data's realism and applicability. 20 The financial implications of data augmentation are an important consideration in promoting global health equity. Researchers in wealthier regions often have greater access to the tools and funding, potentially exacerbating inequalities. 21 In contrast, researchers in underserved areas may face significant barriers to adopting these technologies. Open Science initiatives could help promote the sharing of augmented datasets and tools, enabling broader access. 21,22 Publicly available platforms can democratize research opportunities, while transparency protocols requiring researchers to disclose their augmentation methods could foster collaboration and reduce disparities. By addressing these financial and equity concerns, the benefits of augmented data can be distributed more equitably across research communities. 22 Finally, the ethical implications of using augmented data in mental health research must not be overlooked. Augmented data can mitigate privacy concerns; however, generating realistic patient data raises questions about consent and transparency. In addition, ethicists will need to develop clear guidelines for using augmented data in healthcare AI models that align with clinicians' preferred practices and optimize patient confidentiality.
Frameworks that promote positive clinician-AI interactions can ensure that AI data-driven models undergo the same rigorous inspection as models based on real-world data and can be successfully implemented in clinical settings. 23,24 The use of augmented data in mental health research is an exciting frontier, with the potential to overcome long-standing challenges of data scarcity and imbalance. Data augmentation has been demonstrated to be a useful tool in other medical fields, such as medical imaging. However, introducing augmented data into the mental health field must be handled with caution. While the promise of enhanced model performance and data diversity is desirable, the risks of bias, unreliability, and ethical concerns may limit feasibility. We extend our gratitude to the researchers and clinicians whose foundational work on data augmentation and mental health inspired this commentary. Special thanks to the Interventional Psychiatry Program lab members at St. Michael's Hospital for their valuable insights and feedback. Vector Institute, Toronto Metropolitan University | Publication | 2025-02-18 | Argyrios Perivolaris, Alice Rueda, Karisa Parkington, Apurv Soni, Sirisha Rambhatla, Samavi, R., Rakesh Jetly, Andrew J Greenshaw, Yanbo Zhang, Bo Cao, Sridhar Krishnan, Hilary Pang |
| Opportunities and Barriers of Generative Artificial Intelligence in the Training of Psychiatrists: A Competencies-Based Perspective Vector Institute, Toronto Metropolitan University, University of Toronto | Publication | 2025-02-01 | Hilary Y M Pang, Shakila Meshkat, Bazen Gashaw Teferra, Alice Rueda, Samavi, R., Sri Krishnan, Thomas Doyle, Sirisha Rambhatla, Sandra DeJong, Sanjeev Sockalingam, Tanya Horsley, Brian Hodges, Bhat, V. |
| Reimagining psychiatric care with agentic AI: promise, challenges, and a roadmap forward Agentic artificial intelligence (AI) represents a pivotal shift in clinical decision support, moving beyond static tools by reasoning, adapting, and acting alongside clinicians. Psychiatry, grounded in subjective experience, trust, and longitudinal care, offers both an opportunity and a high-stakes testbed. Agentic systems may enhance documentation, personalize care, support continuous monitoring, and extend access, while raising risks around bias, explainability, privacy, and therapeutic alliance. In this Perspective, we (i) define psychiatry-specific agentic AI distinct from decision-support and fully autonomous systems; (ii) synthesize current evidence across studies; (iii) propose assistive, collaborative, and semi-autonomous roles; and (iv) outline a roadmap for responsible implementation. McMaster University, Toronto Metropolitan University, University of Toronto | Publication | 2026-02-01 | Divya Sharma, Shakila Meshkat, Argyrios Perivolaris, Mohammad Amin Kamaleddin, Bazen Gashaw Teferra, Alice Rueda, Samavi, R., Rakesh Jetly, Vijay Mago, Yuqi Wu, Yanbo Zhang, Bo Cao, Andrew Greenshaw, Sri Krishnan, Bhat, V. |
| Superposition as Lossy Compression: Measure with Sparse Autoencoders and Connect to Adversarial Vulnerability Vector Institute | Publication | 2025-01-01 | Leonard Bereska, Zoe Tzifa-Kratira, Samavi, R., Stratis Gavves |
| Understanding LLM Scientific Reasoning through Promptings and Model's Explanation on the Answers Large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding, reasoning, and problem-solving across various domains. However, their ability to perform complex, multi-step reasoning tasks, essential for applications in science, medicine, and law, remains an area of active investigation. This paper examines the reasoning capabilities of contemporary LLMs, analyzing their strengths, limitations, and potential for improvement. The study uses prompt engineering techniques on the Graduate-Level Google-Proof Q&A (GPQA) dataset to assess the scientific reasoning of GPT-4o. Five popular prompt engineering techniques and two tailored promptings were tested: baseline direct answer (zero-shot), chain-of-thought (CoT), zero-shot CoT, self-ask, self-consistency, decomposition, and multipath promptings. Our findings indicate that while LLMs exhibit emergent reasoning abilities, they often rely on pattern recognition rather than true logical inference, leading to inconsistencies in complex problem-solving. The results indicated that self-consistency outperformed the other prompt engineering techniques with an accuracy of 52.99%, followed by direct answer (52.23%). Zero-shot CoT (50%) outperformed multipath (48.44%), decomposition (47.77%), self-ask (46.88%), and CoT (43.75%). Self-consistency performed the second worst in explaining the answers. Simple techniques such as direct answer, CoT, and zero-shot CoT have the best scientific reasoning. We propose a research agenda aimed at bridging these gaps by integrating structured reasoning frameworks, hybrid AI approaches, and human-in-the-loop methodologies. By critically evaluating the reasoning mechanisms of LLMs, this paper contributes to the ongoing discourse on the future of artificial general intelligence and the development of more robust, trustworthy AI systems. 
Vector Institute, McMaster University, Toronto Metropolitan University, University of Toronto | Publication | 2025-07-25 | Alice Rueda, Mohammed S Hassan, Argyrios Perivolaris, Bazen G Teferra, Samavi, R., Sirisha Rambhatla, Yuqi Wu, Yanbo Zhang, Bo Cao, Divya Sharma, Sridhar Krishnan, Bhat, V. |